Some convergently three-term trust region conjugate gradient algorithms under gradient function non-Lipschitz continuity

This paper introduces two three-term trust region conjugate gradient algorithms, TT-TR-WP and TT-TR-CG, which converge when the gradient function is not Lipschitz continuous, without any additional conditions. These algorithms possess the sufficient descent and trust region properties and are globally convergent. To assess their numerical performance, we compare them with two classical algorithms in restoring noisy gray-scale and color images and in solving large-scale unconstrained problems. In restoring noisy gray-scale images, if we set the running time of TT-TR-WP as the standard, then TT-TR-CG takes around 2.33 times longer, and the other algorithms take around 2.46 and 2.41 times longer, respectively. In restoring the same color images, the proposed algorithms perform relatively well compared with the other algorithms. Additionally, TT-TR-WP and TT-TR-CG are competitive on unconstrained problems: the former has wide applicability, while the latter has strong robustness. Moreover, both proposed algorithms outperform the baseline algorithms in terms of applicability and robustness.

Consider the unconstrained optimization problem
$$\min\{h(x) \mid x \in \mathbb{R}^n\}, \tag{1}$$
where the objective function $h : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. The conjugate gradient (CG) algorithm is widely used to solve (1), and its iteration formula is written as
$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$
where $x_{k+1}$, $\alpha_k$ and $d_k$ are the next iteration point, the step size and the search direction, respectively. The search direction $d_k$ is generally defined by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{3}$$
where $g_k$ is the gradient of the objective function $h(x)$ at the iteration point $x_k$, and $\beta_k \in \mathbb{R}$ is a scalar.
Many CG algorithms have been proposed to solve large-scale optimization problems and engineering problems. In Ref. 4, a general conjugate gradient method using the Wolfe line search is proposed, with a condition on the scalar $\beta_k$ that is sufficient for global convergence. In Ref. 16, a projection-based method is proposed to solve large-scale nonlinear pseudo-monotone equations without Lipschitz continuity. In Refs. …0, 21, Sheng et al. proposed some trust region algorithms to solve nonsmooth minimization, large-residual nonsmooth least squares problems and optimization problems. Yuan et al. proposed some nonlinear conjugate gradient methods for nonlinear equations and image restoration in Refs. 24,25. In Ref. 5, Dai summarized some analysis of the conjugate gradient method. In Ref. 9, the authors implemented conjugate gradient solvers on graphics processing units. In Ref. 12, the authors proposed a new conjugate gradient method with guaranteed descent and an efficient line search. In Ref. 18, the authors proposed a hybrid conjugate gradient algorithm combining the PRP and FR algorithms. In Ref. 23, Wei et al. proposed a conjugate gradient algorithm that uses a negative coefficient in the formula of the search direction.
In fact, an important task is the design of $\beta_k$, and some classical expressions are widely used, including the Hestenes-Stiefel (HS) 8,14,27, Liu-Storey (LS) 22, Polak-Ribière-Polyak (PRP) 11,25,26,28, Dai-Yuan (DY) 6,29, conjugate descent (CD) 10,13 and Fletcher-Reeves (FR) 15 methods. The first three have relatively good numerical performance but fewer theoretical results, while the opposite is true of the others. The definitions are listed in Table 1, where $\|\cdot\|$ is the Euclidean norm. The primary components of conjugate gradient algorithms are the search direction, the step size (when applicable), and global convergence. The ultimate objective is to achieve a satisfactory balance between numerical efficiency and theoretical guarantees.
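As a rough illustration of the classical choices of $\beta_k$ named above, the following sketch evaluates the six textbook formulas for a single step. The formulas are the standard ones from the literature, with $y_{k-1} = g_k - g_{k-1}$, and are not copied from Table 1. The function name `cg_betas` and the sample vectors are purely illustrative.

```python
import numpy as np

def cg_betas(g_new, g_old, d_old):
    """Return the classical beta_k values for one CG step.

    g_new = g_k, g_old = g_{k-1}, d_old = d_{k-1} (1-D NumPy arrays).
    """
    y = g_new - g_old                      # y_{k-1} = g_k - g_{k-1}
    gg_new = float(g_new @ g_new)          # ||g_k||^2
    gg_old = float(g_old @ g_old)          # ||g_{k-1}||^2
    gy = float(g_new @ y)                  # g_k^T y_{k-1}
    dy = float(d_old @ y)                  # d_{k-1}^T y_{k-1}
    dg = float(-(d_old @ g_old))           # -d_{k-1}^T g_{k-1}
    return {
        "HS": gy / dy,
        "LS": gy / dg,
        "PRP": gy / gg_old,
        "DY": gg_new / dy,
        "CD": gg_new / dg,
        "FR": gg_new / gg_old,
    }

# Illustrative data: a steepest-descent previous direction, d_{k-1} = -g_{k-1}.
g_old = np.array([1.0, 2.0])
d_old = -g_old
g_new = np.array([0.5, -1.0])
betas = cg_betas(g_new, g_old, d_old)
```

With $d_{k-1} = -g_{k-1}$, the LS and PRP values coincide, as do CD and FR, because $-d_{k-1}^T g_{k-1} = \|g_{k-1}\|^2$ in that case.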
In fact, the sufficient descent property is a prerequisite for theoretical analysis and is expressed by the inequality
$$g_k^T d_k \le -t\|g_k\|^2, \tag{4}$$
where $t > 0$. Moreover, in the trust region technique the search radius plays a crucial role in determining numerical efficacy. The search direction is obtained by minimizing a quadratic model of $h$ within the trust region, where $\Delta_k$ denotes the trust region radius.
The search direction in CG algorithms is also said to satisfy the trust region property if the following formula holds:
$$\|d_k\| \le t_1\|g_k\|, \tag{5}$$
where $t_1 > 0$. Equations (4) and (5) are intimately connected with global convergence. Furthermore, an inexact line search is frequently used to determine a suitable step size $\alpha_k$. This paper adopts the weak Wolfe-Powell (WWP) inexact line search, which is formulated as
$$h(x_k + \alpha_k d_k) \le h(x_k) + \delta \alpha_k g_k^T d_k \tag{6}$$
and
$$g(x_k + \alpha_k d_k)^T d_k \ge \tau g_k^T d_k, \tag{7}$$
where $\delta \in (0, \tfrac{1}{2})$ and $\tau \in (\delta, 1)$.
The aforementioned discussions are closely linked to global convergence, which requires certain fundamental assumptions: (i) the objective function must be continuously differentiable; (ii) the level set $S = \{x \in \mathbb{R}^n : h(x) \le h(x_0)\}$ must be bounded, where $x_0$ denotes the initial point; and (iii) the gradient function $g(x)$ must be Lipschitz continuous. The FR method 1, modified HS method 7, modified LS method 17, and modified DY method 29 achieve global convergence through the formula
$$\|g(x) - g(y)\| \le L\|x - y\|, \quad \forall x, y \in \mathbb{R}^n,$$
where $L > 0$ is the Lipschitz constant. In other words, the Lipschitz continuity of the gradient function is a prerequisite for existing works, which prompts us to consider whether global convergence can be attained without it. This paper proposes some three-term trust region conjugate gradient methods that converge without the Lipschitz continuity condition, with the main properties summarized as follows:
• The proposed algorithms possess both the sufficient descent and trust region properties, without any additional conditions. The trust region property is derived from the trust region algorithm, while the algorithm design is based on classical approaches such as Hestenes-Stiefel (HS) and Polak-Ribière-Polyak (PRP).
• These algorithms achieve global convergence under the weak Wolfe-Powell line search technique even when the gradient function is not Lipschitz continuous.
• The applications of these algorithms include the restoration of noisy gray-scale and color images, as well as the solution of large-scale unconstrained problems. The case studies illustrate that TT-TR-WP and TT-TR-CG possess superior numerical performance.
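The weak Wolfe-Powell conditions can be made concrete with a small step-size search. The sketch below is not the authors' implementation. It assumes the standard form of the two conditions (sufficient decrease with parameter $\delta$, curvature with parameter $\tau$) and uses a textbook bracketing and bisection scheme. The function name `weak_wolfe_powell` and the test function `h` are hypothetical.

```python
import numpy as np

def weak_wolfe_powell(h, grad, x, d, delta=0.2, tau=0.9, max_iter=60):
    """Find a step size satisfying the weak Wolfe-Powell conditions.

    Bracketing/bisection sketch: shrink the step when the sufficient
    decrease condition fails, enlarge it when the curvature condition fails.
    """
    h0 = h(x)
    slope0 = float(grad(x) @ d)            # g_k^T d_k, must be negative
    if slope0 >= 0.0:
        raise ValueError("d is not a descent direction")
    lo, hi = 0.0, np.inf                   # current bracket for the step size
    alpha = 1.0
    for _ in range(max_iter):
        if h(x + alpha * d) > h0 + delta * alpha * slope0:
            hi = alpha                     # sufficient decrease fails: step too long
            alpha = 0.5 * (lo + hi)
        elif float(grad(x + alpha * d) @ d) < tau * slope0:
            lo = alpha                     # curvature fails: step too short
            alpha = 2.0 * alpha if np.isinf(hi) else 0.5 * (lo + hi)
        else:
            return alpha                   # both conditions hold
    return alpha                           # best effort after max_iter trials

# Hypothetical smooth test function and its gradient.
def h(x):
    return float(np.sum(x ** 4) + 0.5 * np.sum(x ** 2))

def grad(x):
    return 4.0 * x ** 3 + x

x0 = np.array([1.0, -2.0])
d0 = -grad(x0)                             # steepest-descent direction
alpha = weak_wolfe_powell(h, grad, x0, d0, delta=0.2, tau=0.9)
```

The default values of `delta` and `tau` mirror the parameter ranges $\delta \in (0, 1/2)$ and $\tau \in (\delta, 1)$ but are otherwise arbitrary.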

Motivation and TT-TR-WP
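The first three-term conjugate gradient formula was proposed by Zhang et al. 30, in which the search direction is defined by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \dfrac{g_k^T y_{k-1} d_{k-1} - g_k^T d_{k-1} y_{k-1}}{\|g_{k-1}\|^2}, & k \ge 1, \end{cases} \tag{8}$$
where $y_{k-1} = g_k - g_{k-1}$. Formula (8) satisfies the sufficient descent property without any additional conditions, while the trust region property is closely related to the objective function, Lipschitz continuity, and the level set.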
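The claim that the three-term direction satisfies sufficient descent without extra conditions can be checked numerically. The sketch below assumes the commonly cited form of the Zhang et al. direction, $d_k = -g_k + \beta_k^{PRP} d_{k-1} - \theta_k y_{k-1}$ with $\theta_k = g_k^T d_{k-1} / \|g_{k-1}\|^2$ and $y_{k-1} = g_k - g_{k-1}$. Under that assumption the two extra terms cancel in $g_k^T d_k$, leaving $g_k^T d_k = -\|g_k\|^2$. The function name and the random vectors are illustrative only.

```python
import numpy as np

def three_term_prp_direction(g_new, g_old, d_old):
    """Three-term PRP-type direction (assumed form, see lead-in)."""
    y = g_new - g_old                          # y_{k-1}
    denom = float(g_old @ g_old)               # ||g_{k-1}||^2
    beta = float(g_new @ y) / denom            # PRP coefficient
    theta = float(g_new @ d_old) / denom       # coefficient of the third term
    return -g_new + beta * d_old - theta * y

# Arbitrary vectors: the identity below does not depend on how they were produced.
rng = np.random.default_rng(0)
g_old, g_new, d_old = rng.normal(size=(3, 5))
d_new = three_term_prp_direction(g_new, g_old, d_old)

# Sufficient descent holds with equality: g_k^T d_k = -||g_k||^2.
descent_gap = float(g_new @ d_new + g_new @ g_new)
```

Because the identity is purely algebraic, it holds for any previous direction and any line search, which is what "without any additional conditions" refers to.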
Formula (9) was introduced by Yuan et al. 28 under the weak Wolfe-Powell line search technique to define the search direction. The step size $\alpha_{k-1}$ is included in the search direction (9). This formula not only satisfies the sufficient descent property without other conditions, but also guarantees global convergence without Lipschitz continuity, while the trust region property is closely linked to the relation $\alpha_{k-1} d_{k-1} = x_k - x_{k-1}$, the objective function, and the level set.
To summarize, while formulas (8) and (9) possess the sufficient descent property without additional conditions, they have several limitations. The trust region property, which is vital for both theoretical analysis and numerical performance, depends on the objective function, the basic assumptions, and a complex analysis. Additionally, there exist simpler and more cost-effective algorithms that simultaneously achieve better numerical performance and theoretical results.
The aforementioned discussion inspires us to propose the following formula (10).

Remark 1
(i) Formula (10) possesses the sufficient descent and trust region properties, which are independent of any additional conditions. (ii) Global convergence is guaranteed even when the gradient function is not Lipschitz continuous. (iii) The excellent numerical performance of the classical HS algorithm is incorporated into TT-TR-WP through a specified denominator.

The global convergence of TT-TR-WP
This section analyzes the global convergence of TT-TR-WP, beginning with the sufficient descent and trust region properties.
Lemma 3.1 The search direction (10) simultaneously has the sufficient descent (4) and trust region (5) properties, i.e., relations (11) and (12) hold.
Proof If $k = 0$, then $d_0 = -g_0$ and $\|d_0\| \le \|g_0\| \le (1 + \frac{2}{\sigma})\|g_0\|$. If $k \ge 1$, the corresponding relations are obtained from formula (10), which completes the proof.

Remark 2
(i) Lemma 3.1 proves the sufficient descent and trust region properties of the search direction (10), which are independent of any assumptions and line search techniques. (ii) Starting from formula (11) and then applying formula (12), a further relation, formula (13), is obtained.
To achieve global convergence, certain basic assumptions are proposed.

Assumption
(i) The level set $S = \{x \mid h(x) \le h(x_0)\}$ is well-defined and bounded, where $x_0$ is the initial point. (ii) The function $h(x)$ is continuously differentiable and bounded below.
Under these assumptions, the following significant properties hold. Property 1: The iteration sequence $\{x_k\}$ is bounded. Property 2: The gradient function $g(x)$ is continuous on the level set. We now turn to the global convergence of TT-TR-WP.

If the sequences $\{x_k, d_k, \alpha_k, g_k\}$ are generated by TT-TR-WP, then the following formula holds
Proof We argue by contradiction and first make an assumption in which $\varepsilon_C$ is a positive constant.
Additionally, since the iteration sequence $\{x_k\}$ is bounded, there exists a convergent subsequence $\{x_{k_i}\}$. Similarly, since the gradient function is continuous, there exist $\epsilon_1 > 0$ and an integer $N_1 > 0$ such that the corresponding relation holds. From formula (13), there exist $\epsilon_2 > 0$ and an integer $N_2 > 0$ satisfying a further relation. From (16), (17) and (11), another formula follows. On the other hand, a lower bound is obtained from (7). Taking the limit on both sides and setting $N = \max\{N_1, N_2\}$, with the subsequence $\{x_{k_i}\}$, we deduce that there exists a subsequence $\{x_{k_i}\}$ for which a relation holds that contradicts relation (11). Hence the original formula holds and the proof is completed.

TT-TR-CG and theoretical analysis
This section proposes the other modified three-term trust region CG algorithm, TT-TR-CG, and proves some of its properties.
In TT-TR-CG, the search direction is given by formula (19), where $\mu > 0$. This subsection first describes the proposed algorithm.
TT-TR-CG: A convergent three-term trust region CG algorithm with the weak Wolfe-Powell line search
Step 0: Initialize
Step 2: Choose the step size $\alpha_k$ under formulas (6) and (7).

Remark 4
(i) The search direction (19) satisfies both the sufficient descent and trust region properties simultaneously. (ii) The global convergence analysis is established under non-Lipschitz continuity of the gradient function and the weak Wolfe-Powell line search technique. (iii) The good numerical performance of the classical PRP algorithm is partly incorporated into TT-TR-CG through the specified denominator.
To obtain the global convergence, some basic assumptions are proposed.

Assumption
(i) The level set $S = \{x \mid h(x) \le h(x_0)\}$ is defined and bounded, where $x_0$ is an initial point. (ii) The objective function $h(x)$ is continuously differentiable and bounded below.

Case studies
This section applies the proposed algorithms to restore noisy images and to solve large-scale unconstrained optimisation problems in order to test their numerical performance. For comparison, this paper introduces two baseline algorithms from Refs. 26,28, namely MPRP and A-T-PRP-A, whose formulas are (8) and (9), respectively. The former is the first three-term conjugate gradient algorithm and is widely cited. The latter is the latest algorithm that updates the search direction with the step size and possesses global convergence without Lipschitz continuity. Both baseline algorithms possess good numerical performance and theoretical properties in existing works.
The experimental environment consists of an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz (1.80 GHz) with 16 GB RAM, running the Windows 11 operating system.

Image restoration
The restoration of noisy images is of great practical importance and is widely used. This subsection uses TT-TR-WP, TT-TR-CG and the baseline algorithms to restore noisy images in order to test their numerical performance. Three images are chosen because they are classical, widely used test images; see Refs. 24,25.
The objective function and experimental settings are described as follows. The candidate noise index set is denoted as $N$, the objective function as $\omega(u)$, and the edge-preserving function as $\chi$. The true image containing $K \times L$ pixels is denoted as $x$. For a more detailed explanation of image restoration, please refer to Refs. 3,24,25,28.
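Restoration quality is measured by the peak signal-to-noise ratio,
$$\mathrm{PSNR} = 10\log_{10}\frac{(2^{num} - 1)^2}{\mathrm{MSE}},$$
where MSE is the mean square error between the original image and the processed image, and $num$ is the number of bits.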
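For readers who want to reproduce the quality measure, the following is a minimal sketch of a PSNR computation based on the MSE and the number of bits. It assumes the standard definition $\mathrm{PSNR} = 10\log_{10}\big((2^{num}-1)^2/\mathrm{MSE}\big)$. The function name `psnr` and the tiny test images are illustrative and are not taken from the experiments.

```python
import numpy as np

def psnr(original, restored, num_bits=8):
    """Peak signal-to-noise ratio in dB for images with num_bits per pixel."""
    original = np.asarray(original, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = float(np.mean((original - restored) ** 2))   # mean square error
    if mse == 0.0:
        return float("inf")                            # identical images
    peak = 2.0 ** num_bits - 1.0                       # 255 for 8-bit images
    return float(10.0 * np.log10(peak ** 2 / mse))

# Illustrative 8-bit images: every pixel differs by exactly one gray level.
clean = np.full((4, 4), 100, dtype=np.uint8)
noisy = np.full((4, 4), 101, dtype=np.uint8)
value = psnr(clean, noisy)          # MSE = 1, so PSNR = 10 * log10(255**2)
```

Converting to `float64` before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.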
The stopping rule of the algorithm has the form $(\cdot) < \varepsilon$, and the parameters are $\delta = 0.2$, $\tau = 0.895$, $\sigma = 0.1$, $\mu = 0.1$, $\varepsilon = 10^{-6}$.
In restoring noisy gray-scale images, Table 2 shows that TT-TR-WP exhibits the best numerical performance in terms of running time, TT-TR-CG is the second best, MPRP is third, and A-T-PRP-A is the slowest. Furthermore, if we set the performance of TT-TR-WP as the standard, then TT-TR-CG takes around 2.33 times longer, and the other algorithms take around 2.46 and 2.41 times longer, respectively. Table 4 further demonstrates that all algorithms obtain highly similar SSIM and PSNR values. Combining the above discussion, we conclude that, for highly similar restoration quality, TT-TR-WP and TT-TR-CG perform relatively well and the proposed algorithms are competitive. In summary, TT-TR-WP exhibits impressive numerical performance, and TT-TR-CG is highly competitive with the others. To save space, for noise ratios of 70% and 90% this paper records only the numerical results and omits the restored images; see Fig. 1 for the displayed results. In each row, the first column is obtained by TT-TR-WP, the second column by TT-TR-CG, the third column by A-T-PRP-A, and the last column by MPRP.

Color image restoration
To further evaluate the performance of the proposed algorithms, this section applies the four algorithms to restore color images with different levels of noise. The peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM) and the mean squared error (MSE) are widely used measures for image quality assessment and are used in this section. To save space, for noise ratios of 20%, 60% and 80% this paper records only the numerical results and omits the restored images. The stopping rule of the algorithm has the form $(\cdot) < \varepsilon$, and the parameters are $\delta = 0.0885$, $\tau = 0.885$, $\sigma = 0.0015$, $\mu = 1.1555$, $\varepsilon = 10^{-4}$. In Table 5, the total running times of the four algorithms are 73.83, 74.88, 80.02 and 74.52 s, respectively. Additionally, Tables 6, 7 and 8 show that the PSNR, MSE and SSIM values of the algorithms are highly similar, but the proposed algorithms are relatively competitive. The images restored by the algorithms at a noise ratio of 40% are presented in Fig. 2. In each row, the first column is obtained by TT-TR-WP, the second column by TT-TR-CG, the third column by A-T-PRP-A, and the last column by MPRP.
Table 3. The ratio of total running time compared with TT-TR-WP.

General unconstrained optimization
To further test the numerical performance, this subsection applies the algorithms to solve large-scale unconstrained optimization problems. Sixty-five classical functions are randomly selected from Ref. 2, as shown in Table 9, with dimensions of 3000, 6000, and 12,000. The stopping criterion is $\|g(x_k)\| < \varepsilon$ or $NI > 8000$, where $NI$ is the iteration number and $g(x_k)$ is the gradient at the point $x_k$. The parameters used are $\delta = 0.2$, $\tau = 0.9$, $\sigma = 0.001$, $\mu = 0.1$, $\varepsilon = 10^{-6}$. The running time in seconds is used as the reference standard for evaluating numerical performance, as shown in Table 10. The relative numerical performance in solving large-scale problems is illustrated in Fig. 3, in which the red line denotes TT-TR-WP, the black line denotes TT-TR-CG, the blue line denotes A-T-PRP-A, and the remaining line denotes MPRP. TT-TR-WP has a high initial value, which means that it possesses relatively good robustness. TT-TR-CG exhibits a gradually increasing trend throughout, which means that it possesses relatively good applicability. TT-TR-WP and TT-TR-CG both possess better robustness and applicability than the others.
In summary, TT-TR-WP and TT-TR-CG possess better numerical performance than the baseline algorithms in terms of applicability and robustness: TT-TR-WP has the best robustness and relatively good applicability, and TT-TR-CG is the opposite.
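The text does not name the technique used to draw the relative-performance curves. One common choice for this kind of comparison is a performance profile in the style of Dolan and Moré, and the sketch below assumes that choice purely for illustration. For each problem, every solver's running time is divided by the best time on that problem, and the curve value at $\tau$ is the fraction of problems whose ratio is at most $\tau$. The function name and the sample timings are hypothetical, and failures are encoded as `np.inf`.

```python
import numpy as np

def performance_profile(times, taus):
    """Fraction of problems each solver handles within a factor tau of the best.

    times: array of shape (n_problems, n_solvers); np.inf marks a failure.
    Assumes at least one solver succeeds on every problem.
    Returns an array of shape (len(taus), n_solvers).
    """
    times = np.asarray(times, dtype=float)
    best = times.min(axis=1, keepdims=True)      # best time per problem
    ratios = times / best                        # performance ratios, all >= 1
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Hypothetical timings (seconds) for 3 problems and 2 solvers.
sample_times = [[1.0, 2.0],
                [4.0, 2.0],
                [3.0, np.inf]]                   # solver 2 fails on problem 3
profile = performance_profile(sample_times, taus=[1.0, 2.0, 4.0])
```

In such a profile, the value at $\tau = 1$ is the share of problems on which a solver is fastest, and the height reached for large $\tau$ is the share of problems it solves at all.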

Conclusion
This paper introduces two three-term trust region conjugate gradient algorithms, TT-TR-WP and TT-TR-CG, which converge when the gradient function is not Lipschitz continuous, without any additional conditions. These algorithms possess the sufficient descent and trust region properties and are globally convergent. To assess their numerical performance, we compare them with two classical algorithms in restoring noisy gray-scale and color images and in solving large-scale unconstrained problems. For highly similar SSIM and PSNR values on noisy gray-scale images, TT-TR-WP exhibits the best numerical performance in terms of running time, TT-TR-CG is the second best, MPRP is third, and A-T-PRP-A is the slowest. Furthermore, if we set the performance of TT-TR-WP as the standard, then TT-TR-CG takes around 2.34 times longer, and the other algorithms take around 2.46 and 2.42 times longer, respectively. In restoring the same color images, the proposed algorithms perform relatively well compared with the other algorithms. Additionally, in the comparative experiments on algorithm performance, the curve of TT-TR-CG has the maximum initial value, while the curve of TT-TR-WP is the second best, indicating that TT-TR-CG and TT-TR-WP are relatively more robust and highly stable across diverse situations. In summary, TT-TR-WP and TT-TR-CG exhibit relatively better performance in terms of applicability and robustness.

$x^{3/2}\sin(1/x)$ for $x \in (0, 1]$. (ii) The global convergence of TT-TR-WP is established under the weak Wolfe-Powell line search technique and non-Lipschitz continuity of the gradient function. (iii) The sufficient descent and trust region properties, (

Theorem 4.1
If the sequences $\{x_k, d_k, \alpha_k, g_k\}$ are generated by TT-TR-CG, then the following formula holds. Proof The proof is similar to that in the section "The global convergence of TT-TR-WP" and is therefore omitted.

Figure 1.
From left to right, the images disturbed by 50% salt-and-pepper noise, the images restored by TT-TR-WP (first column), TT-TR-CG (second column), A-T-PRP-A (third column) and MPRP (last column), respectively.

Figure 2.
From left to right, the images disturbed by 40% salt-and-pepper noise, the images restored by TT-TR-WP (first column), TT-TR-CG (second column), A-T-PRP-A (third column) and MPRP (last column), respectively.

Figure 3.
The running time of diverse algorithms on tested problems.

Table 2.
The running time under different noise ratios with diverse algorithms.

Table 4.
The SSIM and PSNR under different noise ratios with diverse algorithms.

Table 5.
The running time with different noise ratios across various algorithms (s).

Table 6.
The PSNR with different noise ratios across various algorithms.

Table 7.
The MSE with different noise ratios across various algorithms.

Table 8.
The SSIM with different noise ratios across various algorithms.

Table 10.
The running time of diverse algorithms on tested problems.